Splice site AFLP
Field of the invention 

[01]. The present invention relates to a method for identifying and analyzing nucleic acid
sequences that contain or are associated with splice sites. In particular, the invention
provides a method for identifying and analyzing nucleic acid sequences based upon
polymorphisms associated with such splice sites, a method for targeting genic regions
based on conserved splice site sequences, and a method for the conversion into a
PCR assay of the splice site specific fragments obtained by the method of the invention.



Background of the invention
[02]. Plant breeders undertake continuous efforts to create new varieties that have higher
yields and better quality. In many cases, the trait to be improved requires a phase of
vegetative growth before the actual assessment or selection can be carried out. For
instance, if a breeder wants to select tomato fruits with a better shelf life or higher pigment
content, it will take at least two months (from the seedling stage) before the actual
observation of fruits can be made. This is of course identical in melon and even true for
some agronomic species such as canola. In the last case, oil composition can only be
determined when the seeds are ready for harvest. Similar considerations apply to other
organisms such as animals, humans etc.

[03]. In order to accelerate the identification of suitable lines having the desired
characteristics in segregating populations, molecular biologists set up genetic marker
systems that allow indirect selection of plants with the desired genetic composition. This
means that in an ideal case, DNA from a seedling is analysed to determine whether the
desired trait will be present in a much later stage of plant development. The traits
envisaged are not only quality traits but also resistances to viruses, fungi etc. In the case of
resistance markers, the breeder is much less dependent on disease tests or natural
infestations for testing for a successful cross. The criteria that are preferably fulfilled in
order to make a genetic marker suitable for the stated purpose are that the DNA sequences
that are being visualised are (tightly) linked to the desirable allele and discriminate
between, in the case of resistance genes, resistant and susceptible alleles (polymorphisms).
Visualisation of the polymorphic DNA sequences can be done using various methods
known in the art such as RFLP, AFLP, RAPD, microsatellites etc. All these techniques
are concerned with the ultimate goal, which is to visualise variation in DNA sequences
from any organism that are linked to specific alleles of particular genes. The degree of
coupling (association) of the polymorphic DNAs with the alleles of interest determines the
suitability (i.e. predictive value for the trait) of the marker. The ultimate marker for a
particular gene is the polymorphism within the particular gene locus itself.
[04]. As said herein before, various methods for analyzing nucleic acid sequences are
known in the art, such as RFLP, AFLP, RAPD, microsatellites etc. Often, such techniques
involve amplification - such as by PCR - of one or more parts of the nucleic acid(s) of a
mixture of restriction fragments generated from the nucleic acid(s). The amplified mixture
thus obtained is then analysed, e.g. by detection of one or more of the amplified fragments.
For example, the amplified fragments may be separated based on differences in length or
molecular weight, such as by gel electrophoresis, after which the amplified fragments are
visualised, e.g. by autoradiography of the labelled amplified fragments or blotting
followed by hybridisation. The resulting pattern of bands is referred to as a (DNA)
fingerprint.

[05]. Usually in DNA fingerprinting, fingerprints of closely related species, subspecies,
varieties, cultivars, races or individuals are compared. Such related fingerprints can be
identical or very similar, i.e. contain a large number of corresponding - and therefore less
informative - bands. Differences between two related DNA-fingerprints are referred to as
genetic markers reflecting DNA-polymorphisms in the genome. These are amplified
fragments - i.e. bands - which are unique in or for a fingerprint and/or for a subset of
fingerprints. The presence or absence of such polymorphic fragments in a fingerprint - or
the pattern thereof - can be used as a genetic marker. Such a genetic marker can be used,
for instance, to identify a specific species, subspecies, variety, cultivar, race or individual;
to establish the presence or absence of a specific inheritable trait and/or of a specific gene;
and/or to determine the state of a disease. For a further discussion of DNA-fingerprinting,
DNA-polymorphisms, genotyping, PCR and similar amplification techniques, as well as
the techniques and materials used therein, reference is inter alia made to the prior art
mentioned hereinbelow, as well as to the standard handbooks.
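
The comparison of two related fingerprints can be illustrated with a short sketch. It is only
an illustration, assuming that each fingerprint is reduced to a set of band sizes in base
pairs; the function name and the band sizes are hypothetical and are not taken from the
invention.

```python
def compare_fingerprints(fingerprint_a, fingerprint_b):
    """Split the bands of two fingerprints into shared and unique bands.

    Assumption: a fingerprint is modelled as a collection of fragment
    lengths (in base pairs); real gels also carry intensity information.
    """
    a, b = set(fingerprint_a), set(fingerprint_b)
    shared = sorted(a & b)   # corresponding, less informative bands
    unique = sorted(a ^ b)   # bands present in only one fingerprint
    return shared, unique


# Hypothetical band sizes for two related cultivars.
shared, unique = compare_fingerprints({120, 185, 240, 310}, {120, 185, 262, 310})
print(shared)  # [120, 185, 310]
print(unique)  # [240, 262]
```

The bands returned as unique are the candidate polymorphic fragments in the sense of the
paragraph above.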

[06]. One DNA-fingerprinting technique - which is advantageous in that it requires no
prior knowledge of the sequence to be analysed - is selective restriction fragment
amplification or AFLP. In general, AFLP comprises the steps of:

(a) digesting a nucleic acid, in particular a DNA or cDNA, with one or more
specific restriction endonucleases, to fragment the DNA into a corresponding series
of restriction fragments;

(b) ligating the restriction fragments thus obtained with a double-stranded synthetic
oligonucleotide adapter, one end of which is compatible with one or both of the
ends of the restriction fragments, to thereby produce tagged restriction fragments of
the starting DNA;

(c) contacting the tagged restriction fragments under hybridizing conditions with
one or more oligonucleotide primers;

(d) amplifying the tagged restriction fragment hybridised with the primers by PCR
or a similar technique so as to cause further elongation of the hybridised primers
along the restriction fragments of the starting DNA to which the primers
hybridised; and

(e) detecting, identifying or recovering the amplified or elongated DNA fragment
thus obtained.
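
Steps (a) to (d) can be mimicked in silico. The sketch below is a strong simplification and
not the AFLP protocol itself: it assumes a single enzyme cutting as G^AATTC (the EcoRI
pattern, chosen here only as an example), it ignores adapter ligation, and it models
selective amplification as a prefix test on one fragment end. The target sequence and
function names are hypothetical.

```python
import re


def digest(seq, site, cut_offset):
    """Step (a): cut seq at every occurrence of a recognition site.

    cut_offset is the cut position inside the site, e.g. 1 for G^AATTC.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:]) if j > i]


def selectively_amplified(fragments, site_remnant, selective_bases):
    """Steps (c)-(d), simplified: keep fragments whose end carries the
    site remnant directly followed by the primer's selective bases."""
    prefix = site_remnant + selective_bases
    return [f for f in fragments if f.startswith(prefix)]


# Hypothetical target sequence containing three GAATTC sites.
target = "TTGAATTCAGGCTAGAATTCTGCCAGAATTCAAT"
fragments = digest(target, "GAATTC", 1)
print(fragments)
print(selectively_amplified(fragments, "AATTC", "A"))
```

Adding one selective base ("A") keeps two of the three fragments that start with the site
remnant, which shows how selective bases reduce the complexity of the amplified mixture.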

[07]. The AFLP-fingerprint thus obtained provides information on sequence variation in
(subsets of) the restriction enzyme sites used for preparation of the AFLP template and the
nucleotide(s) immediately adjacent to these restriction enzyme sites in the starting DNA.
[08]. By comparing AFLP-fingerprints from related individuals, again polymorphic
fragments (also referred to as AFLP-markers) can be detected/identified, e.g. for the
purposes mentioned hereinabove.

[09]. For a further description of AFLP, its advantages, its embodiments, as well as the
techniques, enzymes, adapters, primers and further compounds and tools used therein,
reference is made to US 6,045,994, EP-B-0 534 858, EP 976835 and EP 974672,
WO01/88189 and Vos et al. Nucleic Acids Research, 1995, 23, 4407-4414.
[10]. Although the basic AFLP technology (basic AFLP) is a very efficient technique for
identifying and analyzing polymorphisms in random subsets of nucleic acid sequences,
DNA or cDNA, basic AFLP does not contain any discriminating factor that allows the
amplification of selected subsets with particular sequence characteristics. For instance,
basic AFLP cannot distinguish between coding and non-coding regions. Basic AFLP is
also not specifically capable of identifying AFLP markers that are associated with specific
or predefined sections of the genome such as those representing genic regions of the
genome (intron and exon sequences).

[11]. One of the objects of the present invention is to provide methods for the
identification and/or analysis of nucleic acid sequences. It is also an object of the present
invention to provide for improved or alternative methods for the identification or analysis
of splice site based or splice site located polymorphisms, and to identify and analyze
markers based [...] of the present invention to provide for such
a method [...] technology.

Description of the invention

[12]. The present invention pertains to a method for the analysis of polymorphisms. The
method provides [...] determination of polymorphisms that may be
enriched [...] site sequences in the genome under
investigation [...] polymorphisms and fingerprints that are
more closely linked to, or are at least indicative of, genes, in particular to polymorphisms
and fingerprints that are more linked to the presence of introns and exons. The method is
based on the structural features of the nucleotide sequence at the intron-exon boundaries.
The method is also based on (parts of) the AFLP technology.

[13]. Generally, the above-described objects of the invention are achieved by a method
wherein the nucleic acid is analysed, and more in particular one or more adapter-ligated
restriction fragments derived from the nucleic acid are analysed, using at least one primer
(depicted herein as splice site primer, splice site-specific primer or S3P primer). The S3P
primer preferably targets splice site borders (intron-exon junctions, splicing junctions,
intron-exon boundaries) and is preferably designed to hybridise to (and prime extension
from) conserved splice site sequence motifs present in target nucleic acids. Thus, the above
objects are achieved by amplifying the nucleic acid - and in particular one or more
adapter-ligated restriction fragments generated from the nucleic acid - with at least one
S3P-primer and then analysing the amplified mixture thus obtained, which is enriched for
genic sequences/fragments.


Detailed description of the invention 
[14]. In its broadest scope, the invention comprises the use of at least one S3P-primer in
analyzing nucleic acid sequences. In particular, the invention comprises the use of a
S3P-primer in combination with another primer in analyzing nucleic acid sequences or of the
use of two primers in analyzing nucleic acid sequences of which one primer is an S3P
primer. More in particular, the invention comprises the use of a S3P-primer in analysing a
nucleic acid sequence for (the presence or absence of) splice site-associated
polymorphisms/markers.

[15]. By splice site-associated polymorphism or marker is generally meant any
polymorphism or marker that is caused by, and/or that is related to, the presence and/or
absence of a splice site in the nucleic acid, e.g. at one or more specific sites in the nucleic
acid. Under splice site-associated polymorphism in the present invention is also
understood the polymorphism associated with the AFLP technology. In AFLP, the
polymorphism is mostly located in the recognition sites of the restriction endonuclease(s).
Thus a polymorphic fragment obtained by the use of an S3P primer and an AFLP primer is
considered a splice site-associated polymorphic fragment, regardless of the location of the
polymorphism (in the splice site or the recognition sites of the restriction endonuclease
used). Usually, in the invention, the presence or absence, respectively, of a polymorphism
at such site(s) in the nucleic acid to be analysed (or for instance the presence of a different
splice site at such site(s)) will lead to the generation of different polymorphic fragments,
for instance bands that correspond to amplified fragments of different size and/or length
(so-called fragment length polymorphisms).

[16]. The coding sequences (exons) of genes are frequently interrupted by non-coding
stretches of DNA (introns). Introns are generally transcribed as part of precursor RNAs and
subsequently removed by a cleavage-ligation process called splicing. The structural
features of introns and the underlying mechanism for splicing form the basis for a
classification of different kinds of introns. The structural features for accurate splicing are
also found at the borders between introns and exons (the junctions). The junctions have
a well conserved, though relatively short, consensus sequence. It is possible to assign a
specific end to every intron by relying on the conservation of intron-exon junctions. The
junctions can be aligned to conform to the consensus sequence given in Figure 2A. The
subscript in Figure 2A indicates the percent occurrence of the specified base (or type of
base: N = A, C, T, G; Py = pyrimidine base; Pu = purine base) at each consensus position
among a large number of introns analysed. High conservation is found only immediately
within the intron at the presumed junctions. This identifies the sequence of a generic intron
as GT...AG. Because the intron defined in this way starts with the dinucleotide GT and
ends with the dinucleotide AG, junctions are often described as conforming to the GT-AG
rule (the actual sequences in the RNA are, of course, GU-AG). Note that the two sites have
different sequences and so they define the ends of the intron directionally. They are
generally named proceeding from the left to the right along the intron, that is, as the left (or
5') and right (or 3') splice sites. Sometimes they are called donor and acceptor sites. The
consensus sites are implicated as the sites recognised in splicing by point mutations that
prevent splicing in vivo and in vitro. The GT-AG rule describes the splice sites of nuclear
genes of many (if not all) eukaryotes. This implies that there is a common mechanism for
splicing introns out of RNA. For introns of mitochondria, chloroplasts and other organelles,
other consensus sequences are known and can be applied likewise in the present invention.
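
As an illustration of the GT-AG rule only, the following sketch lists every stretch of a DNA
sequence that starts with GT and ends with AG. It is not a splice site predictor: most
GT...AG stretches in a real genome are not introns, and the minimum length and the example
sequence are arbitrary assumptions.

```python
def candidate_introns(seq, min_len=6):
    """Return (start, end) pairs such that seq[start:end] begins with GT
    and ends with AG and is at least min_len nucleotides long."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptor_ends = [j + 2 for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends if a - d >= min_len]


# Hypothetical sequence: exon end CAG, intron GTAAGTTTTTCAG, exon start GA.
example = "CAGGTAAGTTTTTCAGGA"
for start, end in candidate_introns(example):
    print(start, end, example[start:end])
```

The first candidate, positions 3 to 16, is the stretch GTAAGTTTTTCAG; the second candidate
shows why the dinucleotides alone are not sufficient and why the longer consensus is used.
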
[17]. Accordingly, as visualised in Figure 2, introns usually contain at one end (the 5'
end here) the dinucleotide GT and at the distal end (the 3' end here) the dinucleotide AG.
This is the so-called GT-AG rule (see Lewin, Genes VI, Oxford University Press 1998,
ISBN 0 19 8577788 pp. 885-920; Lewin, Genes IV, Oxford University Press 1990, ISBN 0
0198542682 pp. 578-609). The GT-AG rule describes the splice sites of nuclear genes of
many eukaryotes. On the 5' end of the intron (thus in the exon), 2 nucleotides are also
highly conserved, in many cases 5'-AG-3'. On the 3'-side of the GT dinucleotide (thus in
the intron) high conservation can be seen for a tetranucleotide 5'-AAGT-3'. Taken
together this means that at the 5' side of the intron and extending two nucleotides into the
exon eight nucleotides can be identified. These eight nucleotides can be identified with
high homology throughout eukaryotes. It is expected in the art that within the different
kingdoms and in particular at the level of species, the degree of conservation will be very
high. At the 3' border (in the intron), sequence conservation is also observed (see Lewin
1998, 1990) and similar observations apply to the 3' end of the intron-exon junction.
[18]. Using these consensus sequences of intron-exon junctions, primers can be designed
that allow for the selective amplification of fragments that contain these consensus
sequences. This allows the selective amplification of fragments that contain splice site
junctions and thus the selective amplification of gene related fragments. The subset of
selectively amplified fragments hence comprises an increased amount of information on
markers that are directly related to genes. This is in contrast with the conventional
technologies such as RAPD and AFLP, wherein primarily anonymous (non-targeted)
sequences are analysed and not the gene or allele of interest itself.

[19]. Various types of introns result in different splice sites having different consensus
sequences. Examples thereof are those described in Singer & Berg, Genes and Genomes,
1991, University Science Books ISBN 063202879-6 pp. 556 ff.; T. Cech, Cell 44 (1986),
207; T. Cech, Int. Rev. of Cytology 93 (1985) 3; CR Cantor and CL Smith, Genomics,
John Wiley & Sons, New York, pp. 530 ff.; T. Brown, Genomes, 1999, Bios Scientific
Publishers ISBN 1859962017, pp. 212-219. Based on the generalised knowledge available
in these citations as well as in numerous other publications describing splice site structures
and consensus structures, the skilled man can design a suitable splice site-specific primer
that can be used in the present invention. For the identification of putative splice sites in
sequences, there are also various computer programs available such as from NetPlantGene
[http://www.cbs.dtu.dk/services/NetPGene]; BDGP
[http://www.fruitfly.org/seq_tools/splice.html]; and Genio
[http://genio.informatik.uni-stuttgart.de/GENIO/splice/].

[20]. Table 1 discloses various types of splice sites with locations and/or consensus
sequences. The symbol ^ depicts a splice site; W = A or T; M = A or C; R = A or G;
Y = C or T; K = G or T; S = G or C; H = A, C or T; B = C, G or T; V = A, C or G;
D = A, G or T; N = A, C, G or T; and n indicates multiple nucleotides. This nucleotide
nomenclature is used throughout the application. Invariant bases (consensus sequence) are
underlined. The bases are shown as they occur in RNA.



Table 1. Splice sites:

Intron type             5' splice junction   Near 3' splice junction   3' splice junction   Where found
GU-AG                   CRG^GU(A/G)AGU       A                         YnAG^N               Nuclear pre-mRNA (general)
GU-AG                   ^GUAUGU              UACUAAC                   YnAG^N               Nuclear pre-mRNA (yeast)
tRNA                    N^N                  -                         N^N                  -
Group I                 U^                   -                         G^                   Nuclear rRNA, mitochondrial mRNA and rRNA, organelle RNAs, bacterial RNAs
Group II                -                    -                         YnAU^                Mitochondrial mRNA, organelle RNAs, prokaryotes
Group III               -                    -                         -                    Organelle RNAs
Chloroplast (Euglena)   ^GUG(C/U)G           -                         YnAU^                -
Pre-tRNA introns        -                    -                         -                    Eukaryotic nuclear pre-tRNA
Archaeal introns        -                    -                         -                    Various RNAs
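
The degenerate nucleotide codes used in Table 1 can be handled programmatically. The sketch
below is an illustration under stated assumptions: it uses the DNA alphabet (T instead of U),
it does not model the repeat marker n, and the function names are hypothetical.

```python
from itertools import product

# Degenerate codes as defined in the nomenclature paragraph above.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "M": "AC", "R": "AG", "Y": "CT", "K": "GT", "S": "GC",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def matches(pattern, seq):
    """True if seq is one of the concrete sequences described by pattern."""
    return len(pattern) == len(seq) and all(
        base in CODES[code] for code, base in zip(pattern, seq.upper())
    )


def expand(pattern):
    """List every concrete sequence described by a degenerate pattern."""
    return ["".join(bases) for bases in product(*(CODES[c] for c in pattern))]


print(expand("GTRAGT"))          # ['GTAAGT', 'GTGAGT']
print(matches("YYAG", "CTAG"))   # True
```

Expanding a pattern grows quickly with the number of degenerate positions, so for long
patterns matching is usually preferable to enumeration.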



[21]. Other introns such as twintrons (i.e. introns within introns, such as described in D.
W. Copertino and R. B. Hallick, "Group II and group III introns of twintrons: potential
relationships to nuclear pre-mRNA introns", TIBS (1993) 18:467-471; or R. G. Drager
and R. B. Hallick, "A complex twintron is excised as four individual introns", Nucl. Acids
Res. (1993) 21:2389-2394) are also within the scope of the invention.
[22]. The method of the invention is schematically illustrated in the non-limiting Figure
1. In Figure 1, the S3P-primer is indicated as (1), the nucleic acid to be amplified - also
referred to hereinbelow as the target DNA - is indicated as (2). The (sequence of) a splice
site present in/on the target DNA is indicated as (3), with the intron part of the splice site
as (3A) and the exon part of the splice site as (3B). As schematically shown in Figure 1,
the S3P-primer (1) is (intended to be) complementary to that part of the sequence of the
target DNA (2) that at least comprises a part of the splice site sequence (3). Preferably the
S3P-primer comprises the junction of the splice site (between 3A and 3B), so as to allow -
e.g. during amplification - the extension of the S3P-primer (1) in the 3'-direction along the
target DNA (2), which serves as a template for the extension of the S3P-primer (1). As also
schematically shown in Figure 1, the S3P-primer (1) may be considered to comprise
essentially two parts, i.e. a 3'-part and a 5'-part, indicated in Figure 1 as (4) and (5),
respectively. The 3'-part (4) of the S3P-primer (1) is (intended to be) essentially
complementary to (the sequence of) the intron (3A), more in particular to part of the
consensus sequence of the intron section. The 5'-part (5) of the S3P-primer (1) is (intended
to be) complementary to the exon, more in particular to the consensus sequence of the exon
sequence (3B).

[23]. The S3P-primer (1) is at least such that, when (the sequence of) a splice site
(3) to which [...] is present in the target nucleic acid, it is
capable of hybridising with [...] so as to allow extension of the S3P-primer
(1) along the target nucleic acid (2). The skilled person will understand that the S3P primer
comprises sequence that may be complementary to splice site sequences either in the sense
strand or in the nonsense strand. Usually, a S3P-primer (1) used in the invention will
contain a total of between 8 and 20 nucleotides, and in particular between 12 and 16
nucleotides. Of these, between 2 and 10 nucleotides, and in particular between 4 and 8
nucleotides, preferably between 5 and 7 nucleotides, will form part of the 3'-part (4) of the
S3P-primer (1), i.e. the part that corresponds to (the intron-derived motif of) the splice
site. Between 4 and 10 nucleotides, and in particular between 6 and 8 nucleotides, will form
part of the 5'-part (5) of the S3P-primer (1), i.e. the exon region. The S3P primer is a
primer comprising a conserved splice site border sequence or at least part of a consensus
sequence of a splice site border sequence, preferably at least 50% of the consensus
sequence, more preferably at least 60%, 70%, 80%, 90%, in particular 95% and most
preferably 100% of the consensus sequence. In one embodiment the splice site specific or
S3P primer is a primer that comprises a section that is derived from the GT...AG consensus
motif of the intron and is capable of annealing to part of the GT...AG consensus motif,
preferably in combination with one or more of the consensus nucleotides of the exon, as
depicted in Figure 2A. In a preferred embodiment, the S3P primer comprises at least the
oligonucleotide fragment GT. More preferably, the S3P primer comprises at least the
oligonucleotide fragment X1X2GT, wherein X stands for A, C, T, or G. Preferably X1 is A.
Preferably X2 is G. In a further preferred embodiment the S3P primer comprises at least
the oligonucleotide fragment GTX3X4X5X6, wherein X stands for A, C, T, or G. Preferably
X3 is A. Preferably X4 is A. Preferably X5 is G. Preferably X6 is T.

[24]. In a more preferred embodiment the primer comprises the oligonucleotide fragment
X1X2GTX3X4X5X6. It is noted that X1 through X6 (also depicted as the variable
nucleotides) can be selected independently from each other, thus a primer comprising the
fragment ANGTNNNT is within the scope of the present invention. A preferred splice site
primer contains at least 4 nucleotides selected from amongst the generalised consensus
sequence AGGTAAGT, more preferably 5 nucleotides, particularly preferred 6
nucleotides, more particularly preferred 7 nucleotides. This means that a primer according
to this embodiment can comprise, for example, the following structure: ANGTNNGT or
NNGTNNGT. Most particularly preferred is that the primer contains the generalised
consensus sequence AGGTAAGT.
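
The count of consensus nucleotides in a primer section can be checked with a few lines of
code. This is a minimal sketch, assuming that the section is given as an 8-nucleotide string
aligned to AGGTAAGT and that N (or any non-matching base) does not count; the function name
is hypothetical.

```python
CONSENSUS = "AGGTAAGT"


def consensus_matches(section, consensus=CONSENSUS):
    """Count positions at which the primer section equals the consensus."""
    if len(section) != len(consensus):
        raise ValueError("section must be aligned to the consensus")
    return sum(1 for s, c in zip(section.upper(), consensus) if s == c)


for section in ("ANGTNNNT", "NNGTNNGT", "ANGTNNGT", "AGGTAAGT"):
    print(section, consensus_matches(section))
```

Under these assumptions ANGTNNNT and NNGTNNGT each carry 4 consensus nucleotides, ANGTNNGT
carries 5, and the full consensus carries 8.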

[25]. Based on the guidelines for primer design and variations therein as outlined herein
above, the skilled man can design splice site specific primers for splice sites having other
consensus sequences, based for instance on the consensus sequences outlined in Table 1 or
otherwise identified in the literature.

[26]. As the splice sites may be present in a degenerated form in a genome, it may be
useful to use a set of S3P primers. Such a set comprises more than one S3P primer.
Preferably, a set comprises two, three, four, five, six, seven, eight, nine or ten different S3P
primers. Preferably, such a set comprises more than 10, 20, 30, 40 or even more than 50
different S3P primers. Each primer in a set differs in at least one, preferably two, more
preferably three and most preferably four or more of its nucleotides from the other primers
in the set. Each of the primers in the set follows the general structure for S3P primers as
outlined herein before. Such a set is advantageous if, for instance, for a certain species the
consensus sequence is not known or contains more variation than usual.
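
Whether the primers of a set differ from each other in at least a given number of
nucleotides can be verified by pairwise comparison. The sketch assumes primers of equal
length compared position by position; the function names and the example primers are
hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length primers differ."""
    if len(a) != len(b):
        raise ValueError("primers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_difference(primers):
    """Smallest number of differing nucleotides between any two primers."""
    return min(hamming(a, b) for a, b in combinations(primers, 2))


primer_set = ["AGGTAAGT", "TGGTAAGT", "ACGTAAGA"]
print(min_pairwise_difference(primer_set))  # 1
```

A result of 1 means the set satisfies the "at least one" criterion but not the preferred
criteria of two or more differing nucleotides.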
[27]. Within such a set of S3P primers, the primers can be independently varied and the
variable nucleotides can be selected at will. Preferably, the variation within the S3P primer
set can be provided by variation of X1-X6 between two primers in the set. For example, for
a first primer in the set X1 is A and for the second primer in the set X1 is T, where the
remaining sequence of the primer is identical for the first and second primer.
[28]. One of the preferred splicing sites of the present invention contains an average
structure that can be depicted as A(64)G(78) in the exon;
G(100)T(100)A(62)A(68)G(84)T(63) on the 5' end of the intron;
12PyNC(65)A(100)G(100) on the 3' end of the intron. The numerical values indicate the
percent occurrence of the specified base or type of base at each consensus position of the
splice site. This means that throughout a set of splice sites, whether from different
organisms or genomes, the consensus sequence of the splice site is a statistical average.
The value 100 means that each splice site contains that nucleotide, or in other words, that
specific nucleotide has an occurrence of 100% at that position in this type of splice site. A
value less than 100%, for instance 62%, means that there is variation at that position in that
type of splice site amongst a wide number of splice sites of that type that have been
investigated. From that statistical average, the most common nucleotide has an occurrence
of 62%. The complementary percentage is distributed amongst the other nucleotides. Thus,
A(62) means that, on average, 62% of the splice sites contain an A at that particular
position of the splice site. The other 38% can be distributed amongst C, G or T. The
percentages may differ when specific genomes are targeted and can be adjusted
accordingly. Sets of S3P primers may be synthesised that are based on the above average
structures of splice sites. Thus a preferred set of S3P primers based on the average 5' intron
structure may have a composition whereby in the first (i.e. most 5') position of the section
that corresponds to the splice site sequences in the primers in the set 64% of the
nucleotides are A and 12% of the nucleotides are G, 12% of the nucleotides are T, 12% of
the nucleotides are C, in the second position 78% of the nucleotides are G, 7.33% of the
nucleotides are A, 7.33% of the nucleotides are T and 7.33% of the nucleotides are C and
so on for the further positions in the S3P primer set. A preferred set of S3P primers based
on the average 3' intron structure thus may have a composition whereby in the first 12 (i.e.
most 5') positions of the section that corresponds to the splice site sequences in the primers
in the set a 50/50 mixture of pyrimidines is present, in the 13th position equal amounts of
A, G, C and T are present, in the 14th position 65% of the nucleotides are C and 11.66% of
the nucleotides are A, 11.66% of the nucleotides are G and 11.66% of the nucleotides are
T, and so on. In these compositions of S3P primers the percentages indicated above are only
used as examples and may be adapted by the skilled person; in particular the percentages of
the three different minority nucleotides are not necessarily the same. A preferred S3P
primer based on the average 3' intron structure has a composition whereby in the first 12
positions nucleotide analogues are present that contain degenerate bases mimicking
pyrimidines, i.e. a C/T mix, or mimicking purines, i.e. an A/G mix, in its complement.
Such nucleotide analogues contain e.g. the P and K bases (mimicking respectively
pyrimidines and purines) as described by Kong and Brown (1989, Nucl. Acids Res 17:
10373-383; 1992, Nucl. Acids Res 20: 5149-52). The length of the sections in the S3P
primers having the sequence based on the average 5' and 3' splice site structures or their
complements are as described above. It is preferred that when a set of S3P primers is
designed, the average composition of the splice site specific parts of the primers
corresponds to this distribution, or at least for 70, 80 or 90%. It is noted that when other
splice sites are targeted similar sets of primers can be designed taking into account the
average composition of the splice sites throughout a genome of interest.
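
The per-position compositions described above follow from simple arithmetic: the majority
nucleotide takes its stated percentage and, in the example given, the remainder is split
evenly over the three other nucleotides. The sketch below reproduces that arithmetic for the
exon/5' intron structure and draws a small primer set from it. The even split, the helper
names, the set size of 10 and the random seed are illustrative assumptions only.

```python
import random

# A(64)G(78) in the exon followed by G(100)T(100)A(62)A(68)G(84)T(63).
FIVE_PRIME_SITE = [("A", 64), ("G", 78), ("G", 100), ("T", 100),
                   ("A", 62), ("A", 68), ("G", 84), ("T", 63)]


def position_distribution(major, percent, alphabet="ACGT"):
    """Percent of each base at one position, minority bases split evenly."""
    rest = (100.0 - percent) / (len(alphabet) - 1)
    return {b: (float(percent) if b == major else rest) for b in alphabet}


def sample_section(profile, rng):
    """Draw one splice site section according to the position profile."""
    bases = []
    for dist in profile:
        letters = list(dist)
        weights = [dist[b] for b in letters]
        bases.append(rng.choices(letters, weights=weights, k=1)[0])
    return "".join(bases)


profile = [position_distribution(b, p) for b, p in FIVE_PRIME_SITE]
print(profile[0])                  # A: 64.0, C, G and T: 12.0 each
print(round(profile[1]["A"], 2))   # 7.33

rng = random.Random(0)             # fixed seed for repeatability
primer_sections = [sample_section(profile, rng) for _ in range(10)]
print(primer_sections)
```

Because positions 3 and 4 are invariant, every sampled section contains GT at those
positions, while the other positions vary in proportion to the stated percentages.
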
[29]. It is noted that the S3P primer of the present invention may comprise other
nucleotides than the nucleotides that are part of the consensus sequence of the splice site.
These nucleotides can be located at any position in the S3P primer (and not just at the
X1-X6 positions) that does not form a part of the GT or AG consensus sequence such as the 3'
end or at the 5' end or between sections of the consensus sequence. Alternatively, one or
more of the nucleotides X1-6 of the consensus sequence may be replaced by so called
universal nucleotide analogues such as inosine, and/or they may contain LNAs, PNAs etc.

[30]. The splice site can be approached by two routes, i.e. orientations (exon-to-intron or
intron-to-exon), and two different primers can be designed accordingly. The skilled person
will know that in the exon-to-intron orientation the S3P primer has a sequence that is
complementary to the sense strand of the splice site sequence and that in the intron-to-exon
orientation the S3P primer has a sequence that is complementary to the nonsense strand of
the splice site sequence. The use of these two different primers may lead to different
(fingerprinting) results and hence to the determination or identification of different
polymorphisms. Both types of primers (exon-to-intron or intron-to-exon) may be present in
a set of S3P primers.
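
Since the two orientations address opposite strands of the same splice site, the
corresponding primer sections are reverse complements of each other. The sketch below only
shows that relationship for the generalised consensus; it assumes plain A, C, G, T input and
does not decide which of the two sequences belongs to which orientation label.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


site = "AGGTAAGT"
print(site, reverse_complement(site))  # AGGTAAGT ACTTACCT
```

Applying the function twice returns the original sequence, which is a convenient sanity
check when both primer types are kept in one set.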

[31]. The S3P primer can be elongated by conventional elongation techniques that may
lead to linear or exponential amplification of the restriction fragment. Examples thereof are
Strand Displacement Amplification (SDA), etc. Preferably, an exponential amplification
technology is used. In particular, an amplification technique based on the polymerase chain
reaction (PCR) is used. PCR commonly employs at least two primers. In the present
invention, this means that, when PCR is used, additional to the S3P primer a second or
further primer is used. The second or further primer may be a random primer or a primer
that is directed against a specific target sequence such as a (retro)transposon, an NBS
region, a microsatellite, or a second splice site. Alternatively, the primer can be directed
against other conserved sequences present in the intron. An example thereof is the
conserved region where, during the transesterification reaction, the hydroxyl attached to the
2' carbon of the adenosine promotes the reaction to form the lariat structure. In yeast, such
conserved regions are known as TACTAAC regions. Alternatively, the primer can be
directed against the adapter of adapter-ligated restriction fragments such as an AFLP
primer. Preferably, the second primer is a second S3P primer or an AFLP primer, more
preferably an S3P primer.

[32]. In an alternative embodiment of the invention, one or more of the primers used to
analyze the nucleic acid sequence(s) can be associated with or is directed against a specific
target sequence such as a (retro)transposon or an NBS region. This primer can be
combined with one or more of AFLP primers, S3P primers, or random primers.
[33]. In a particular preferred embodiment of the preset invention, the second primer 
used in the PCR ampUfication is an AFLP primer. The combination of a S3P primer with 
an AFLP primer as the second primer is used here to illustrate the principle of the 

20 invention. It is explicitly noted that as the second primer any of the abovementioned 

second primers can be used without departing from the gist of the invention. The prior art 
does not describe or suggest a method for analyzing splice site associated polymorphisms 
or markers mvolving the use of both a S3P-primer and an AFLP-primer. 
[34]. The AFLP-primer used in the invention, indicated as (7) in Figure 1, is essentially
the same as a conventional AFLP-primer, in that it is (at least) complementary to (the
sequence of) an adapter, indicated as (8) in Figure 1, that has been linked to the target
DNA (2), so as to allow - e. g. during amplification - the extension of the AFLP-primer (7)
in the 3'-direction along the target DNA (3), which serves as a template for the extension of
the AFLP-primer (7). Most preferably, as in AFLP, the primer contains, at its 3'-end, a
number of so-called selective bases/nucleotides - indicated as (9) in Figure 1 - that are
(intended to be) complementary to (the same number of) bases/nucleotides that, in the target
DNA (2), are directly adjacent to the 3' end of the adapter (8). Using the S3P-primer (1)
and the AFLP-primer (7), the target nucleic acid (2) is amplified, e. g. as indicated by the
arrows in Figure 1. In particular, during the amplification, the S3P-primer (1) will be
extended along one strand of the (double stranded) target DNA (2) and the AFLP-primer
will be extended along the other strand of the (double stranded) target DNA (2), e. g. so as
to allow for efficient/exponential amplification. The mixture of amplified
products/fragments thus obtained may then be analysed, e. g. by detecting/visualizing at
least one, and up to essentially all, of the amplified fragments, e. g. as further
described herein.

[35]. In one aspect, the invention relates to the use of (the combination of) an S3P-primer
and an AFLP-primer in amplifying a nucleic acid sequence (herein also referred to as the
target nucleic acid). In this aspect of the invention, and with reference to Figure 1, the
target nucleic acid (2) comprises the adapter (8) and a further nucleic acid sequence,
indicated as (11) in Figure 1, to which the adapter has been ligated.
In particular, the further nucleic acid sequence (11) present
in the target nucleic acid (2) may be a restriction fragment. For instance, the further nucleic
acid (11) may be a restriction fragment derived from a starting DNA - including but not
limited to genomic DNA, or recombinant DNA such as BAC DNA, cosmid DNA or plasmid
DNA - by restriction with a restriction endonuclease (as further described hereinbelow),
although the invention in its broadest sense is not limited thereto. In addition, the target
nucleic acid (2) will usually be a DNA sequence, and in particular, a double stranded DNA
sequence, although the invention in its broadest sense is again not limited thereto.
[36]. The target nucleic acid (2) may comprise a single adapter (8) but usually comprises
two adapters (8), e.g. each ligated to one end of the restriction fragment (11) present in the
target nucleic acid. In addition, when two adapters (8) are present, they may be the same or
different.

[37]. The target nucleic acid (2) may be part of a mixture of such target nucleic acids.
For instance, when the target nucleic acid comprises a restriction fragment ligated to an
adapter, it may be part of a mixture of such adapter-ligated restriction fragments. Such a
mixture may for instance be obtained by ligating an adapter to a mixture of restriction
fragments, which may be carried out in a manner known per se, for instance as described in
the prior art, including but not limited to EP 0 534 858.

[38]. Optionally, such a mixture of target nucleic acids may (already) have been
subjected to a (pre-)amplification step, i. e. prior to the amplification with the S3P-primer
and the AFLP-primer. For instance, when the target nucleic acid(s) (2) contain two
adapters (8), such a pre-amplification may be carried out as a conventional AFLP
pre-amplification, i.e. using +0/+0 AFLP primers, for which reference is made to EP 0 534 858
and Vos et al., cited herein. This pre-amplification may also have been a selective
pre-amplification for reducing the complexity of the mixture, i.e. using +n/+m AFLP primers,
wherein n and m are integers, independently ranging from 1 to 10.
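
The complexity reduction obtained with +n/+m primers can be estimated with a rough calculation. The sketch below is illustrative only: it assumes (this is not stated in the text) that the four bases occur independently and with equal frequency at the positions adjacent to the adapters, so that each selective nucleotide retains roughly one quarter of the fragments. The function name is hypothetical.

```python
def expected_fraction_amplified(n: int, m: int) -> float:
    """Approximate fraction of adapter-ligated fragments amplified by +n/+m primers.

    Assumption: equal, independent base frequencies next to both adapters,
    so every selective nucleotide keeps about 1/4 of the fragments.
    """
    if n < 0 or m < 0:
        raise ValueError("numbers of selective nucleotides must be non-negative")
    return 0.25 ** (n + m)


if __name__ == "__main__":
    for n, m in [(0, 0), (1, 1), (2, 3)]:
        print(f"+{n}/+{m}: about 1 in {4 ** (n + m)} fragments amplified")
```

Real genomes deviate from equal base frequencies, so the actual reduction for a given primer pair may differ noticeably from this estimate.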
[39]. The target nucleic acid (2) preferably also comprises (or is at least suspected to
contain) at least one splice site, or otherwise the target nucleic acid is at least part of a
mixture of such target nucleic acids of which a target nucleic acid comprises (or is suspected
to contain) a splice site; and in particular a splice site to which the S3P-primer (1) can
and/or is intended to hybridise.

[40]. The AFLP-primer (7) will be essentially the same as a conventional AFLP primer,
e.g. as described in EP 0 534 858, and will generally contain a constant region - indicated as
(10) in Figure 1 - and one or more selective nucleotides in a selective region (9) at the 3'-end
thereof. In addition, the AFLP-primer (7) is most preferably essentially complementary to
at least one of the adapters (8) used, e. g. so as to allow extension of the AFLP-primer (7)
along the target nucleic acid (2). Preferably, the AFLP-primer (7) will contain a total of
between 15 and 50 nucleotides, and in particular between 18 and 30 nucleotides. Also,
preferably, the AFLP-primer (7) will contain between 0 and 6, preferably 1 or 2 or 3 or 4
selective nucleotides.
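
The structure of such a primer (constant region followed by 3' selective nucleotides) can be sketched in a few lines. This is a minimal illustration, not the applicant's design procedure: the constant region shown is a hypothetical example sequence, the function names are invented for this sketch, and priming is simplified to an exact match against the adapter-bearing strand written 5' to 3' in the same sense as the primer.

```python
# Hypothetical constant region (10), used for illustration only.
ADAPTER_CORE = "GACTGCGTACCAATTC"


def make_aflp_primer(constant_region: str, selective: str = "") -> str:
    """Join a constant region and 0-6 selective nucleotides into one primer."""
    selective = selective.upper()
    if len(selective) > 6:
        raise ValueError("use between 0 and 6 selective nucleotides")
    if set(selective) - set("ACGT"):
        raise ValueError("selective nucleotides must be A, C, G or T")
    primer = constant_region.upper() + selective
    if not 15 <= len(primer) <= 50:
        raise ValueError("primer length should be between 15 and 50 nucleotides")
    return primer


def can_prime(primer: str, adapter_strand: str) -> bool:
    """True if the strand (5'->3', starting with the adapter) begins with the primer.

    Simplification: the primer anneals to the complementary strand, which is
    equivalent to the primer being identical to the start of this strand.
    """
    return adapter_strand.upper().startswith(primer)


if __name__ == "__main__":
    primer = make_aflp_primer(ADAPTER_CORE, "AC")
    print(primer, can_prime(primer, ADAPTER_CORE + "ACGGTTA"))
```

Only fragments whose bases directly next to the adapter match the selective nucleotides are extended, which is what restricts amplification to a subset of the fragments.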

[41]. The amplification of the target nucleic acid (2) with the S3P-primer (1) and the
AFLP-primer (7) may be carried out under conditions known per se, including but not
limited to conditions known per se for amplifications in general or using conditions known
per se for amplification using AFLP-primers. Such conditions are for instance described in
the above-mentioned prior art (e.g. EP 0 534 858 for AFLP-primers) and some
non-limiting examples of suitable conditions are given in the Experimental Part hereinbelow. It
is envisaged that based upon these disclosures, the skilled person will be able to select (a
range of) optimal conditions for the amplification of a given (mixture of) target nucleic
acid(s) with a given combination of S3P-primer and AFLP-primer.

[42]. Preferably, the amplification is carried out using only one S3P-primer as described
above and only one AFLP-primer as described above, although the invention in its
broadest sense is not limited thereto.

[43]. In addition, the S3P-primer and the AFLP-primer are preferably such that they
allow for efficient/exponential amplification. In this respect, it should be noted that, when
the target nucleic acid is part of a mixture of such target nucleic acids, in the amplification
step usually more than one of the target nucleic acids that are present in the mixture will be
amplified, i. e. to provide a mixture of amplified fragments.

[44]. After the amplification with the S3P-primer and the AFLP-primer, the amplified
nucleic acid thus generated is detected. For instance, when the amplification step has
provided a mixture of amplified fragments as described hereinabove, one or more - and up
to essentially all - amplified fragments present in the mixture may be detected. The
detection may be carried out using any technique known per se for the detection of an
amplified nucleic acid/fragment and/or for analyzing a mixture of amplified nucleic
acids/fragments. Suitable techniques are described in the abovementioned art and for
instance include techniques in which the amplified fragments are separated and visualised
(e.g. by (capillary) gel electrophoresis and autoradiography to provide a fingerprint); (other)
detection techniques based upon the mass and/or the size of the amplified fragments; and
techniques involving the hybridisation of one or more of the amplified fragments to a
complementary nucleotide sequence (in which the complementary nucleotide sequence
may for instance be immobilised on a suitable carrier, e. g. as part of an array of such
nucleotide sequences) followed by detection of such hybridisation events. Generally, these
detection techniques will be such that they allow for the detection of polymorphisms, as
further described below. The invention is not limited to the use of one primer that
selectively amplifies 'gene-like' sequences by targeting intron-exon junctions together with
an AFLP primer, but may also be carried out using two splice site-specific primers.

[45]. In another aspect, the invention relates to a method for analyzing a nucleic acid
sequence, the method at least comprising the steps of: (a) amplifying a restriction fragment
generated from the nucleic acid to be analysed, in which the restriction fragment has been
ligated to an adapter, with one or more S3P-primers and/or an optional AFLP-primer to
provide an amplified nucleic acid sequence; and optionally comprising the further step of:
(b) detecting at least one of the amplified nucleic acid sequences thus obtained.

[46]. More specifically, this aspect of the invention relates to a method for analyzing a
nucleic acid sequence, the method comprising the steps of:

(a) restricting the starting nucleic acid with a restriction endonuclease to provide a
mixture of restriction fragments;

(b) ligating the restriction fragments thus obtained to an adapter;

(c) amplifying the mixture of adapter-ligated restriction fragments thus obtained
with one or more S3P-primers, preferably one S3P primer, and an optional
second primer, preferably an AFLP-primer, to provide a mixture of amplified
restriction fragments; and

(d) optionally, detecting at least one of the amplified restriction fragments thus
obtained.

In the above aspects of the invention, the (starting) nucleic acid is preferably a DNA
sequence, more preferably a double stranded DNA sequence.

[47]. In particular, the starting nucleic acid can be a nucleic acid that contains (or is at
least suspected to contain) a splice site to which the S3P-primer used can and/or is
intended to hybridise. For instance, the starting nucleic acid sequence can be genomic
DNA, and in particular eukaryotic genomic DNA, or (a mixture or a library of)
recombinant DNA clones, e. g. derived from a plant, animal or a human. For instance, the
starting nucleic acid can be derived from agronomically important crops such as wheat,
cucumber, melon, barley, maize, tomato, pepper, lettuce, rice, soybean etc.; from animals
such as mouse, rat, pig, chicken, fish, etc.; and/or from humans.

[48]. In the restriction step a), the starting nucleic acid is restricted with a restriction
endonuclease, which may be any suitable restriction endonuclease, such as a Type II or
Type IIs, including but not limited to those mentioned below.

[49]. In particular, the starting nucleic acid may be restricted with two different
restriction endonucleases. For instance, the starting nucleic acid may be restricted with a
frequent cutter restriction endonuclease, which serves the purpose of reducing the size of
the restriction fragments to a range of sizes that are amplified efficiently, and a rare cutter
restriction endonuclease, which serves the purpose of targeting rare sequences. For both,
reference is made to for instance EP-A-0 534 858 and EP-A-0 721 987 by applicant,
incorporated herein by reference. The skilled person will understand that the recognition
sequence of a frequent cutter usually has no more than four bases that provide selectivity,
whereas a rare cutter usually has at least six selective bases in its recognition sequence.
However, whether a given enzyme functions as a rare or a frequent cutter also depends on
the base composition of its recognition site and the overall base composition of the sample
DNA to be digested. Thus, a four-cutter with only G's and C's in its recognition site may
act as a rare cutter on AT-rich DNA. Therefore, a frequent cutter is understood to be a
restriction enzyme that upon restriction of a given sample DNA produces restriction
fragments the majority of which is less than 1 kb in length, whereas the majority of
fragments produced with a rare cutter is larger than 1 kb in length.
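
The dependence on base composition can be illustrated with a back-of-the-envelope model. The sketch below assumes (as a simplification, not taken from the text) that the sample DNA is a random sequence with independent bases, G and C each occurring at half the GC content and A and T each at half the AT content, and that fragment lengths are approximately exponentially distributed, so that the median length is ln 2 divided by the per-position site probability. "Majority of fragments below 1 kb" is then read as "median below 1000 bp". Function names are hypothetical.

```python
import math


def site_probability(site: str, gc_content: float = 0.5) -> float:
    """Probability that a recognition site starts at a given position."""
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must lie strictly between 0 and 1")
    p_gc = gc_content / 2.0          # frequency of G, and of C
    p_at = (1.0 - gc_content) / 2.0  # frequency of A, and of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base in recognition site: {base}")
    return prob


def median_fragment_length(site: str, gc_content: float = 0.5) -> float:
    """Approximate median fragment length (bp) under the random-sequence model."""
    return math.log(2) / site_probability(site, gc_content)


def classify_cutter(site: str, gc_content: float = 0.5, threshold_bp: int = 1000) -> str:
    """'frequent' if most fragments are expected to be shorter than threshold_bp."""
    return "frequent" if median_fragment_length(site, gc_content) < threshold_bp else "rare"


if __name__ == "__main__":
    for site, gc in [("TTAA", 0.5), ("GAATTC", 0.5), ("GGCC", 0.5), ("GGCC", 0.2)]:
        print(site, gc, round(median_fragment_length(site, gc)), classify_cutter(site, gc))
```

Under these assumptions a GC-only four-base site is classified as frequent on DNA with 50% GC but as rare on DNA with 20% GC, in line with the remark above on AT-rich DNA. Methylation sensitivity and non-random sequence composition are ignored.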
[50]. Some non-limiting examples of suitable frequent cutter enzymes are MseI, TaqI,
and MboI (Sau3A). Some non-limiting examples of commercially available rare cutters are
PstI, HpaII, MspI, ClaI, HhaI, EcoRI, EcoRII, BstBI, HinP1, MaeII, BbvI, PvuII, XmaI,
SmaI, NciI, AvaI, HaeII, SalI, XhoI and PvuII, of which EcoRI, PstI, HpaII, MspI, ClaI,
EcoRII, BstBI, HinP1 and MaeII are preferred. Preferably, restriction enzymes are used
that produce sticky ends, to facilitate ligation of adapters. In case a combination of two
different restriction enzymes is used, preferably not more than one of them is a blunt
cutter. When blunt cutters are used, the adapters to be ligated are to be modified, either by
the use of a helper oligonucleotide to ligate a single stranded adapter or by the use of
double stranded adapters.

[51]. After the restriction step a), the restricted fragments thus obtained are ligated to an
adapter. This adapter will be essentially the same as the adapter(s) used in conventional
AFLP, for which reference is again made to the prior art relating to AFLP mentioned
above. As in conventional AFLP, the adapter used is preferably such that it is suitable for
use with at least one of the restriction enzymes used in the restriction step a). For instance,
when the starting DNA is restricted with two restriction endonucleases (e. g. a frequent
cutter and a rare cutter) preferably also two adapters are used, each suitable for use with
one of the restriction endonucleases.

[52]. The method associated with the present invention can also be performed using at
least one restriction endonuclease, at least one adapter and at least one AFLP primer in
combination with an S3P primer. The endonuclease is preferably a frequent cutter.

[53]. After the adapter has been ligated to the restriction fragments, the mixture of
adapter-ligated restriction fragments thus obtained may then (directly) be amplified in step
c) with the S3P-primer and the AFLP-primer. Alternatively, the mixture can be amplified
using one or more S3P primers, and/or optionally an AFLP primer. As already indicated
above, prior to the amplification step c), the (adapter-ligated) restriction fragments may
first be subjected to a pre-amplification using one or two AFLP primers or one or more
S3P primers, and in particular a selective pre-amplification for reducing the complexity of
the fragment mixture. For instance, such a selective pre-amplification may be carried out
analogous to a selective pre-amplification known per se from AFLP, for which reference is
again made to the prior art related to AFLP mentioned above. Alternatively, such a
pre-amplification may be performed using one or more splice site-specific primers.
[54]. More in particular, step a), step b) and the optional pre-amplification
step described hereinabove may be carried out in essentially the same manner as the restriction,
ligation and amplification steps of AFLP methodology, e.g. according to
known AFLP protocols. The subsequent amplification step c) of the adapter-ligated
restriction fragments with the S3P-primer and the AFLP-primer may be carried out as
described hereinabove with reference to Figure 1, in which the adapter-ligated
restriction fragments serve as the target nucleic acid (2).
[55]. Thereafter, the amplified mixture thus obtained is analysed, which is also carried
out as described hereinabove. Generally, these detection techniques will be such that they
allow for the detection of polymorphisms, e. g. detectable signals that are unique for the
starting nucleic acid. For instance, such a unique detectable signal may be a unique band in
a fingerprint or a unique hybridisation event/signal on an array; or the lack of such a band
or hybridisation signal. For this purpose, the detectable signal(s) generated for a specific
starting nucleic acid will usually be compared to the detectable signal(s) obtained for one
or more related starting nucleic acids under the same conditions (e. g. the use
of the same restriction enzymes, adapters, pre-amplification (if any), S3P-primer,
AFLP-primer and detection technique), for instance by comparing fingerprints and/or
hybridisation signals/patterns. Such related starting nucleic acids may for
instance have been derived from the same individual and/or from one or more closely
related individuals (e. g. from the same family, genus, species or even variety). For
instance, one or more such related starting nucleic acids may be used/incorporated as
reference sample(s) in the method of the invention, in which case the results for the
starting nucleic acid sequence and the reference sample(s) may be directly compared.
Alternatively, the results obtained for a given starting nucleic acid may be compared to
results generated earlier for one or more related nucleic acid sequences, which may for
instance be part of a database.

[56]. Again, such detection techniques and techniques for analyzing/comparing the
results obtained will be essentially analogous to the techniques used to analyze the results
obtained using AFLP, for which again reference is made to the prior art related to AFLP
mentioned above. Thus, from the above, it will be clear that the method of the invention
may conveniently be carried out analogous to a conventional AFLP amplification, in which
for the main amplification (as opposed to the pre-amplification) a combination of one
AFLP primer and one S3P-primer is used, instead of two AFLP-primers. Alternatively, the
amplification can be performed using one or more S3P primers, optionally in combination
with one or more AFLP primers.
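
The comparison of a fingerprint against a reference sample, as discussed above, amounts to scoring bands as present or absent. The sketch below is a simplified illustration under stated assumptions: each fingerprint is represented as a set of band sizes in base pairs that have already been called and size-matched, and the function name and example values are hypothetical.

```python
def compare_fingerprints(sample: set[int], reference: set[int]) -> dict[str, list[int]]:
    """Score bands as polymorphic (present in only one lane) or shared.

    Assumption: bands are given as integer sizes (bp) that were already
    matched between lanes; real data would need a size tolerance.
    """
    return {
        "unique_to_sample": sorted(sample - reference),
        "absent_from_sample": sorted(reference - sample),
        "shared": sorted(sample & reference),
    }


if __name__ == "__main__":
    sample_lane = {112, 187, 240, 305}     # hypothetical band sizes
    reference_lane = {112, 240, 298, 305}  # hypothetical band sizes
    print(compare_fingerprints(sample_lane, reference_lane))
```

Bands listed under "unique_to_sample" or "absent_from_sample" correspond to the unique band, or the lack of such a band, that the text treats as a candidate polymorphism.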
[57]. Also, as in conventional AFLP, the selective nucleotides (9) of the AFLP-primer
(7) may be selected arbitrarily or randomly. Also, the S3P-primer (1) may comprise, in
addition to the nucleotides that are part of the consensus sequence of the splice site,
nucleotides (6) that are not part of the consensus sequence of the splice site. These
nucleotides can be located adjacent to one or both sides of the consensus sequence or can
be located intermittently in the consensus sequence in case the consensus sequence is not
consecutive. These nucleotides can be randomly selected or can be purposively selected.
When randomly selected, these nucleotides provide the same selective function as the
selective nucleotides of the AFLP primers, i.e. the reduction of the number of fragments to
be amplified, to create a subset of amplified splice site related adapter ligated restriction
fragments. When purposively selected, the selective nucleotides provide the selectivity that
may be used to specifically select the amplification/detection of a predetermined splice site
polymorphism, i.e. a splice site of which not only the sequence is known but also the
intermittent or adjacent sequence is identified. The randomly chosen nucleotides in the
splice site primer can also be selected such that groups of splice site-specific primers are
formed. These effects may also occur due to the natural degeneracy of the primers. The
S3P primers in such a group are selective for a group of specific splice sites that in addition
to the consensus sequence comprise a further set of selective nucleotides. This provides
an additional possibility to selectively amplify subsets of splice site related adapter
ligated restriction fragments. It is also possible to include non-selective nucleotide
analogues such as inosines and the like to provide for reduced selectivity or to avoid
certain degenerate positions in a splice site.
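
How degenerate positions, added selective nucleotides and inosines change the set of sites a primer can target may be sketched as a pattern match. The example is illustrative only: the consensus-like sequence AGGTRAGT stands in for an S3P primer and is not one of the primers of the invention, inosine (written I) is modelled as pairing with any base, and matching is simplified to an exact comparison against the strand that has the same sense as the primer. The function names are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes; "I" (inosine) is treated here as matching any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]", "I": "[ACGT]",
}


def primer_to_regex(primer: str) -> str:
    """Translate a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def find_priming_sites(primer: str, strand: str) -> list[int]:
    """Return 0-based start positions where the primer sequence occurs in the strand."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({primer_to_regex(primer)}))")
    return [match.start() for match in pattern.finditer(strand.upper())]


if __name__ == "__main__":
    strand = "CCAGGTAAGTACTTAGGTGAGTGG"  # hypothetical target strand
    print(find_priming_sites("AGGTRAGT", strand))    # degenerate position R
    print(find_priming_sites("AGGTRAGTAC", strand))  # plus two selective nucleotides
    print(find_priming_sites("AGGTIAGT", strand))    # inosine at the variable position
```

Adding selective nucleotides narrows the set of matching sites, whereas a degenerate code or an inosine widens it, which mirrors the two ways of tuning selectivity described above.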

[58]. One of the advantages of the present method is that, as with conventional AFLP,
the method of the invention does not require any prior knowledge of the sequence to be
analysed, nor the use of any specifically designed primers, apart from the splice
site-specific part. In addition, the invention may allow the detection of splice site-associated
polymorphisms/markers in conjunction with AFLP-markers, and thus provide a very
powerful (combined) technique for the analysis of a starting nucleic acid for both these
types of highly informative genetic markers. However, as with conventional AFLP, it may
be that some (combinations of) specific AFLP-primers and S3P-primers may provide, for a
specific starting DNA, more informative results (e.g. more polymorphic fragments) than
other combinations, and that some combinations may even provide no informative results
at all. Nevertheless, based upon the disclosure herein, the skilled person will be able to
provide one or more suitable combinations of an S3P-primer and an AFLP-primer for
analyzing a specific starting DNA according to the method of the invention, optionally
after some preliminary experiments and/or a limited degree of trial and error.
[59]. In principle, the method of the invention can be used for any application for which
a splice site associated polymorphic marker can be developed or used. Such applications
include, but are not limited to, genotyping, genetic mapping, genetic profiling and
DNA-identification techniques, e.g. to identify a specific species, subspecies, variety, cultivar,
race or individual, to establish the presence or absence of a specific inheritable trait and/or
of a gene; or to determine the state of a disease.

[60]. The invention can also be used for removing band patterns from fingerprints that
have been caused by amplification of chloroplast sequences, due to the differences
between splice site sequences of nuclear coded and plastome genes.

[61]. Generally, the methods of the invention may provide the advantages of:

- efficient targeting of a large proportion of splice sites present in the genome;

- the provision of more direct information pertaining to coding regions of the
genome and consequently of markers that may be more closely linked to genic
regions or traits of interest; and

- highly reproducible fingerprint patterns due to excellent reproducibility of the
AFLP technique compared to other techniques.

[62]. According to the invention, one or more of the splice site-associated markers
identified using the method of the invention may be (further) developed into a classical
PCR-test. This may for instance be carried out by a method as schematically illustrated in
the non-limiting Figure 4.

[63]. In one aspect, the present invention accordingly pertains to a method for the
determination of PCR-primers, to the PCR-primers, preferably determined by the method, to
the use thereof in the development of a PCR-assay and to the use of (a combination of) one or
more S3P primers and at least one AFLP primer in the development of PCR-primers. More
in particular, the present invention provides a method for the development of PCR-primers
that are suitable for use in a conventional PCR-test.

[64]. The present invention provides technologies that allow for the conversion of the
splice site associated markers into primers that can be used in a conventional PCR test. The
present invention also provides for PCR-primers, based on AFLP technology associated
with splice sites. Further, the invention provides for primers that can be used in assays
based on PCR technology.

[65]. Generally, this method involves the identification of a splice site-associated
polymorphic fragment, e.g. as described hereinabove. This polymorphic fragment (11)
(e.g. a fragment amplified using the combination of an S3P-primer and an AFLP primer for
the first restriction enzyme used for AFLP template preparation and optionally one or more
alleles thereof) is then isolated (e.g. cut out of the gel obtained after gel electrophoresis) and
sequenced (step 1 in Figure 4). Based upon the sequence of the polymorphic fragment(s)
thus obtained, a suitable PCR-primer is selected/designed from the sequence flanking the
splice site sequence at the 3' end. Next, this PCR primer, in combination with an AFLP
primer corresponding to the second enzyme used for AFLP template preparation, is used to
amplify a fragment that contains the splice site and the 5' flanking sequence, which is not
included in the polymorphic fragment initially chosen for sequencing (step 2). From this 5'
flanking sequence a suitable second PCR primer is selected/designed (step 3), which
together with the first PCR primer matching the 3' flanking sequence is used in a
conventional PCR-detection, e. g. on a starting DNA (step 4).
[66]. In one aspect, the invention pertains to a method for providing a PCR primer
comprising the steps of identifying a splice site-associated polymorphic fragment
amplified by the combined use of an S3P primer and an AFLP primer for the first restriction
enzyme used for AFLP template preparation, sequencing the fragment, designing and
synthesizing a first PCR-primer for the sequence flanking the splice site sequence at the 3'
end; optionally amplifying a fragment comprising the splice site and at least part of the
5'-flanking sequence using the first PCR-primer and a second AFLP primer used for AFLP
template preparation, and optionally designing and synthesizing a second PCR-primer for
the sequence flanking the splice site sequence at the 5' end.

[67]. The method according to the invention involves the identification of a splice site
associated polymorphic fragment, e.g. as described hereinbefore. This polymorphic
fragment (e. g. a fragment amplified using the combination of an S3P primer and an
AFLP-primer for the first restriction enzyme used for AFLP template preparation and optionally
one or more alleles thereof) is isolated (e. g. cut out of the gel obtained after
gel electrophoresis) and sequenced (step 1 in Figure 4). For sequencing, the gel-excised
fragments may be cloned in convenient sequencing vectors. Alternatively, the gel-excised
fragments are re-amplified in a PCR using the S3P-primer used in the original
amplification, and a modified version of the AFLP primer used in the original
amplification. The modified AFLP-primer preferably contains an additional sequence at its
5'-end that may conveniently be used for priming subsequent sequencing reactions. A
convenient example of such an additional sequence for priming sequencing reactions is
the sequence of the universal M13 sequencing primer.

[68]. Based on the sequence of the polymorphic fragment(s) thus obtained, a suitable
PCR primer is selected/designed from the sequence flanking the splice site sequence at the
3'-end. Next, this PCR-primer, in combination with an AFLP-primer corresponding to the
second enzyme used for AFLP template preparation, is used for the amplification of a
fragment that contains the splice site and an additional 5'-flanking sequence that is
downstream from the splice site with respect to the first PCR primer. This additional
5'-flanking sequence was not present in the polymorphic band initially chosen for sequencing
(step 2 in Figure 4). The additional 5'-flanking sequence is used as basis for the design of a
suitable second PCR-primer (step 3 in Figure 4), which together with the first PCR primer
matching the 3'-flanking sequence is suitable for use in a conventional PCR-detection, e. g.
on the starting DNA (step 4 in Figure 4).
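
Selecting a primer from a sequenced flanking region can be sketched as a scan over candidate windows. The example below is a hedged illustration and not the selection procedure of the invention: the primer length, the GC-content window and the use of the Wallace rule (2 °C per A/T, 4 °C per G/C, a rough estimate intended for short oligonucleotides) are assumptions introduced here, and all function names are hypothetical.

```python
from typing import Optional


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def pick_primer(flank: str, length: int = 20,
                gc_min: float = 0.4, gc_max: float = 0.6) -> Optional[str]:
    """Return the first window of the flank whose GC fraction is within bounds."""
    flank = flank.upper()
    for start in range(len(flank) - length + 1):
        candidate = flank[start:start + length]
        if gc_min <= gc_fraction(candidate) <= gc_max:
            return candidate
    return None  # no acceptable window in this flanking sequence


if __name__ == "__main__":
    flank_3prime = "ACGT" * 5  # hypothetical 3'-flanking sequence
    primer = pick_primer(flank_3prime)
    print(primer, wallace_tm(primer))
```

The same scan could be applied to the 5'-flanking sequence obtained in step 2 to propose the second primer; in practice additional checks (self-complementarity, 3'-end stability, uniqueness in the genome) would normally be made.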

[69]. The present invention provides for a reliable and powerful method for the
generation of PCR primers. The PCR primers obtained according to the invention
preferably are suitable for use in conventional PCR-technology and more preferably are
suitable PCR primers for use in conventional assays based on flanking PCR primers,
whereby the splice sites have been identified using splice site AFLP technology. Hence,
splice site AFLP provides a valuable technique for rapid and reliable identification of
polymorphic splice sites. A further advantage is that the possibility is provided for the
selective enrichment of nuclear coded sequences or organelle coded sequences by using the
3' end, which is a distinct advantage of the present invention over the conventional
techniques.

[70]. In a preferred embodiment, the steps indicated as optional are also
included in the method, i.e. a second PCR primer is designed for the
development of [...] PCR. The skilled person will appreciate
that alternative methods exist for obtaining the additional flanking sequence based on
which the second PCR primer will be designed, in addition to the method based on the
second AFLP primer described in the present application. Such methods are e.g.
exemplified by inverse PCR.




[71]. A 'flanking sequence' in terms of the present invention refers to a sequence adjacent
to a splice site sequence. The flanking sequence will usually be defined by the
distance between a splice site sequence and another sequence, e.g. another splice
site-specific sequence or a sequence designated or suitable as PCR-primer or AFLP primer and
the like. The length of a flanking sequence generally varies between 0 and 500 nucleotides,
preferably up to 250, more preferably up to 150 and most preferably up to 100 nucleotides.
The upper limit will generally be governed by factors such as the resolution of the gel and
the length of the splice site derived fragment.

[72]. In a further aspect, the invention pertains to a method for the determination of a
PCR-primer, comprising the steps of:

- restricting a nucleic acid sequence with a restriction endonuclease to provide a
mixture of restriction fragments;

- ligating the restriction fragments thus obtained to an adapter;

- amplifying the mixture of adapter ligated restriction fragments thus obtained
with an S3P-primer and a first AFLP primer to provide a mixture of amplified
restriction fragments;

- detecting at least one of the amplified restriction fragments thus obtained;

- identifying a splice site-associated polymorphic fragment or band;

- determining the sequence of the polymorphic fragment or band;

- designing a first PCR-primer for the sequence flanking the splice site sequence
at the 3' end;

- optionally amplifying a fragment comprising the splice site and at least part of
the 5'-flanking sequence using the first PCR-primer and a second AFLP primer;

- optionally designing a second PCR-primer for the sequence flanking the splice
site sequence at the 5' end.

[73]. The invention further relates to primers obtainable by the present invention in the
development of an assay, preferably for the analysis of splice sites.

[74]. The invention further relates to the use of (the combination of) an S3P primer and an
AFLP primer in the development of PCR-primers, preferably suitable for use in splice site
assays.

[75]. The polymorphic fragment which is used to determine a suitable PCR-primer is 
preferably derived from genomic DNA, and in particular eukaryotic genomic DNA or (a 
mixture or a library of) recombinant DNA clones, e.g. derived from a plant, animal or 
human being. 

[76]. The invention also relates to the use of a PCR-primer according to the present 
invention in the development of an assay, preferably for the analysis of splice sites, and to a 
kit comprising means for obtaining a PCR-primer according to the invention, as well as to 
a kit comprising a PCR-primer according to the invention. 

[77]. Furthermore, one or more of the splice site-associated polymorphic fragments 
identified by the method of the invention is isolated and optionally sequenced, and is used 
to generate a nucleotide sequence representative for the splice site-associated marker for 
use in, for instance, an array for the analysis of nucleic acid sequences. 

[78]. In yet another aspect, the invention relates to the use of a S3P-primer in the 
methods described hereinabove. The invention also relates to the use of an AFLP primer in 
the methods described hereinabove. 

[79]. In another aspect, the invention relates to the use of a combination of a S3P primer 
and an AFLP-primer in analyzing a nucleic acid sequence. In particular, this aspect of the 
invention relates to the use of the combination of a S3P-primer and an AFLP-primer in 
analyzing a nucleic acid sequence for the presence of polymorphisms associated with 
splice sites. 

[80]. Yet another aspect comprises any data generated by the method of the invention, 
optionally on a suitable data carrier, such as paper or a computer disk. Such data may for 
instance include the generated DNA-fingerprints (e.g. in the form of a gel) and/or 
autoradiographs/photographs or other reproductions thereof, as well as (stored) analogous 
or digital data thereon, e.g. in the form of a database. 

[81]. The invention also comprises kits for use in the invention, the kits at least 
comprising a S3P-primer and an AFLP-primer, and usually also comprising an adapter 
complementary to the AFLP-primer. These kits can further contain any known component 
for such kits, including but not limited to components known per se for AFLP kits, such as 
restriction enzymes (in which case the adapters are preferably suited to be ligated to the 
restricted sites generated with the enzyme); a polymerase for amplification, such as 
Taq-polymerase; nucleotides for use in primer extension; as well as buffers and other 
solutions and reagents; manuals, etc. Further reference is made to the European patent 
application 0 534 858, incorporated herein by reference. 






Description of the Figures 
[82]. Figure 1 is a schematic representation of the method of the invention. In Figure 1, 
1 depicts a S3P primer, 2 is the double stranded target DNA (restriction fragment), 3 is the 
splice site, the intron part of the splice site is indicated as 3A, the exon part as 3B, 4 is the 
part of the S3P primer located at the 3' end and 5 is the 5' end. The AFLP primer (7) 
contains a part (10) that is complementary to the adapter (8) ligated to the restriction 
fragment and may contain selective nucleotides at the 3' end (9). 

[83]. Figures 2 and 2A are an exemplary representation of a consensus sequence of a splice 
site in combination with a target sequence and two primers, one mismatching on the target 
sequence and one matching primer, thereby introducing selective nucleotides in the S3P 
primer. Figure 2A is an exemplary representation of a consensus sequence of a splice site. 

[84]. Figure 3 is an AFLP-fingerprint generated with a splice-site specific primer in 
combination with an AFLP primer. 
PCR profile A: 30 s at 94 °C + 13 *(30 seconds at 65 °C, 0.7 °C/cycle Touch Down) + 60 
seconds at 72 °C; 30 s at 94 °C + 23 *(30 seconds at 50 °C) + 60 seconds at 72 °C; 
PCR profile B: 30 s at 94 °C + 13 *(30 seconds at 65 °C, 0.7 °C/cycle Touch Down) + 60 
seconds at 72 °C; 30 s at 94 °C + 23 *(30 seconds at 56 °C) + 60 seconds at 72 °C; 
PCR profile C: 30 s at 94 °C + 13 *(30 seconds at 45 °C, 1 °C/cycle Touch Up) + 60 
seconds at 72 °C; 30 s at 94 °C + 23 *(30 seconds at 50 °C) + 60 seconds at 72 °C. 



[85]. Sections 1-6 are based on +0/+0 AFLP preamplification with primer combination 
SSPn/AFLP+0. Sections 7-13 are based on +0/+0 AFLP preamplification with primer 
combination SSPn/AFLP+1. Sections 14-18 are based on +1/+1 AFLP preamplification 
with primer combination SSPn/AFLP+3. 

[86]. Figure 4 is a schematic representation of the conversion into a PCR assay. 

[87]. Figure 5 is a representation of a splice-site AFLP screening on Arabidopsis RIL8 
and tomato parental line samples. The left panel represents Arabidopsis RIL8 sample 
screening using a splice site primer SSP9 and MseI +0 primer (M00k) combination on 
template generated using EcoRI and MseI. Lane 13 represents the 10bl size marker, lanes 
14 and 15 represent the parental lines 1 and 2, respectively. The right two panels represent 
the screening of splice site primers SSP3 and SSP9 combined with an AseI +1 primer using 
AseI templates of tomato parental lines. 



Example 1 



[88]. DNA from the Arabidopsis lines Landsberg erecta and Columbia was used to generate 
AFLP fingerprints by use of a splice-site-specific primer (S3P primer) in combination with a 
+0, +1, +2 or +3 EcoRI or MseI AFLP primer. AFLP-reactions with 12 different splice-site 
specific primers (Table 4) in combination with 10 different AFLP primers were performed on 
AFLP restriction-ligation mixture, +0/+0 or +1/+1 AFLP preamplification product. 

[89]. Three different PCR-profiles were used for the amplification of the fragments. 
AFLP fragments obtained were excised out of PAA-gels (56 AFLP-marker bands and 
4 constant bands) and reamplified. Twelve markers were cloned by use of the Original TA 
Cloning Kit (Invitrogen) and 32 clones were sequenced on the MegaBACE. To find out if 
coding regions are preferentially amplified by Splice-site AFLP, the presence of coding 
regions in sequences of AFLP fragments obtained with Splice-site AFLP and in sequences of 
Arabidopsis EcoRI/MseI +2/+3 AFLP markers, obtained with the standard AFLP 
procedure, was determined by a BLAST-search, performed with the PEDANT-software of 
Biomax (Martinsried, Germany). 

[90]. A cloning method was used to avoid the sequencing problems that may occur when 
fragments are excised out of PAA-gels with dense fingerprint patterns: multiple fragments 
may migrate at the same position, and the fragments are not always excised from the PAA-gel 
without contamination by other fragments. By cloning these excised fragments, pure 
fragments are obtained, and sequencing these clones prevents these problems. 

RESULTS 

Fingerprint results 

[91]. Fingerprints generated by use of a splice-site-specific primer (S3P primer) in 
combination with a +0, +1, +2 or +3 EcoRI or MseI AFLP primer and the PCR-profiles 
used for the amplification are shown in Figure 3. Splice-site AFLP markers are marked by 
arrows. 

[92]. PCR profile A: 30 s at 94 °C + 13 *(30 seconds at 65 °C, 0.7 °C/cycle Touch Down) 
+ 60 seconds at 72 °C; 30 s at 94 °C + 23 *(30 seconds at 50 °C) + 60 seconds at 72 °C; 
PCR profile B: 30 s at 94 °C + 13 *(30 seconds at 65 °C, 0.7 °C/cycle Touch Down) + 60 
seconds at 72 °C; 30 s at 94 °C + 23 *(30 seconds at 56 °C) + 60 seconds at 72 °C; 
PCR profile C: 30 s at 94 °C + 13 *(30 seconds at 45 °C, 1 °C/cycle Touch Up) + 60 
seconds at 72 °C; 30 s at 94 °C + 23 *(30 seconds at 50 °C) + 60 seconds at 72 °C. 
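
The annealing temperatures implied by the three PCR profiles can be tabulated with a short sketch. It assumes one possible reading of the notation above: each "13 *(...)" block is a 13-cycle ramp whose annealing temperature changes by the stated step after every cycle, followed by 23 cycles at a constant annealing temperature. The function name and the dictionary layout are illustrative only.

```python
def annealing_schedule(start_temp, step, ramp_cycles, plateau_temp, plateau_cycles):
    """Return the annealing temperature (in degrees C) for every cycle.

    The first ramp_cycles cycles start at start_temp and change by step
    per cycle (negative for Touch Down, positive for Touch Up); the
    remaining plateau_cycles cycles all use plateau_temp.
    """
    ramp = [round(start_temp + step * i, 1) for i in range(ramp_cycles)]
    plateau = [float(plateau_temp)] * plateau_cycles
    return ramp + plateau


# Values taken from the profile descriptions; the reading of the
# notation is an assumption.
PROFILES = {
    "A": annealing_schedule(65, -0.7, 13, 50, 23),
    "B": annealing_schedule(65, -0.7, 13, 56, 23),
    "C": annealing_schedule(45, 1.0, 13, 50, 23),
}

for name, temps in PROFILES.items():
    print(name, len(temps), "cycles; ramp from", temps[0], "to", temps[12])
```

Under this reading every profile has 36 cycles in total. Profiles A and B share the same Touch Down ramp and differ only in the annealing temperature of the final 23 cycles, while profile C ramps upward before settling at 50 °C.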

[93]. Sections 1-6 are based on +0/+0 AFLP preamplification with primer combination 
SSPn/AFLP+0. Sections 7-13 are based on +0/+0 AFLP preamplification with primer 
combination SSPn/AFLP+1. Sections 14-18 are based on +1/+1 AFLP preamplification 
with primer combination SSPn/AFLP+3. 

Sequence analysis 

[94]. After a BLAST search against a public eukaryotic genomic DNA data set, 2 hits 
with predicted genes were found (28.5%) (Table 2). A BLAST search performed with 132 
Arabidopsis AFLP fragment sequences against this eukaryotic genomic DNA data set 
resulted in 10 hits with predicted genes (7.5%). This demonstrates that coding regions are 
preferentially amplified. 

Table 2: Hits of predicted genes found for 2 splice-site AFLP fragments in the BLAST-search: 

Code   Contig       Start  Stop  Description     Best BLAST hit                                                   Hit ID      e-val
gene1  G1 F508.850  95     280   Predicted gene  Hypothetical protein F23A5.3 [imported] - Arabidopsis thaliana   PIR:B96839  9e-32
gene1  E1 F524.480  372    12    Predicted gene  Phosphatidylinositol 3-kinase [imported] - Arabidopsis thaliana  PIR:B96630  2e-47



[95]. The [illegible] by use of a splice-site specific primer in combination [illegible] 
detectable when Splice-site AFLP was performed on AFLP preamplification product. Using 
splice-site specific primers in combination with +0 AFLP primers and +0/+0 AFLP 
preamplification product, dense fingerprints were generated. Five different [illegible] 
primers used in combination with one +3 AFLP primer resulted in completely reproducible 
fingerprints. This [illegible] is based on the splice-site specific sequence [illegible] 
comparable with a standard +2/+3 AFLP fingerprint (14%). The three evaluated PCR 
profiles produced nearly the same fingerprints. 

Example 2 

[96]. In this Example, the objective was to enrich fingerprints for genic regions (intron 
or exon sequences) of the genome on a larger scale than in Example 1. The targeting 
efficiency was determined by sequencing of splice-site PCR fragments followed by 
homology searches. Furthermore, several splice-site PCR primers were tested on tomato 
parental lines. The 12 selected [illegible] splice-site primers from the previous example 
were used to determine the optimal [illegible] combination and the optimal 
amplification profile. The [illegible] on Arabidopsis sequences but were also usable 
to generate fingerprints in tomato. An example of fingerprints generated using splice-site 
primers on Arabidopsis and tomato is shown in Figure 5. This example clearly shows the 
segregation of markers in the RIL8 population. Furthermore, it shows that the splice-site 
primers can be used to generate reproducible fingerprints in tomato. 

Table 3: An overview of the number of scored markers in each primer combination related 
to the splice site region in Arabidopsis (template EcoRI/MseI). 

Primer combination   Code       Scored markers   Two-fits
SSP1/M00k            QK1A074R   27               24
SSP3/M00k            QK1A085L   14               8
SSP5/M00k            QK1A063R   26               24
SSP6/M00k            QK1A077L   18               13
SSP7/M00k            QK1A070R   18               18
SSP9/M00k            QK1A070L   38               32
TOTALS                          141              119 (84%)



[97]. Table 3 shows for splice-site linker-PCR that from a total of 141 scored markers 
119 rendered a two-fit, i.e. the band intensities of a particular fragment are observed in 2 
groups representing presence or absence of the fragment respectively, which means that 
84% of the scored markers could be dominantly scored. This percentage is in agreement 
with that of high quality AFLP fingerprints. Co-dominant scoring is not applicable to a 
RIL8 population, but when using a population containing heterozygotes co-dominant 
scoring is possible. To determine the targeting efficiency, 160 splice-site PCR fragments 
were isolated from polyacrylamide gels with fingerprints using the SSP1, SSP3 or SSP9 
primer and subsequently sequenced. From these sequences, 100 sequences of good quality 
were used for homology analysis, which rendered 24 sequences with homology to 
Arabidopsis protein sequences, which is 24% of the fragments. Of these 100 fragments, 
76% gave homology at DNA level. When using only the sequences that displayed 
homology at DNA level, the targeting efficiency of genic regions in the Arabidopsis 
genome would be 24 out of 76, which is 32%. Compared to the targeting efficiency of 
genic regions using regular AFLP of 7.5% (Example 1), the targeting efficiency is raised 3 
to 4 fold. New primers were designed using sequences of splice-site PCR fragments that 
rendered a homology with proteins from the database. These primers were designed to be 
directed into the intron of the protein, whereas the splice-site PCR fragment sequence was 
directed into the exon. These new SSP primers also generated fingerprints that could be 
readily used for genotyping. A total overview of available and tested splice-site primers is 
shown in Table 4. 
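
The percentages quoted in this paragraph can be rechecked from the counts in Table 3 and in the text. The sketch below is only an arithmetic check; the variable names are illustrative, and the per-combination counts are those listed in Table 3.

```python
# (scored markers, two-fits) per primer combination, as listed in Table 3.
TABLE_3 = {
    "SSP1/M00k": (27, 24),
    "SSP3/M00k": (14, 8),
    "SSP5/M00k": (26, 24),
    "SSP6/M00k": (18, 13),
    "SSP7/M00k": (18, 18),
    "SSP9/M00k": (38, 32),
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


scored = sum(s for s, _ in TABLE_3.values())
two_fits = sum(t for _, t in TABLE_3.values())
dominant_pct = percent(two_fits, scored)

# Counts quoted in the text.
protein_hits = 24        # sequences with homology to Arabidopsis proteins
good_sequences = 100     # good-quality sequences used for homology analysis
dna_hits = 76            # sequences with homology at DNA level
standard_aflp_pct = 7.5  # targeting efficiency of regular AFLP (Example 1)

eff_all = percent(protein_hits, good_sequences)
eff_dna = percent(protein_hits, dna_hits)
fold_low = eff_all / standard_aflp_pct
fold_high = eff_dna / standard_aflp_pct

print(f"two-fits: {two_fits}/{scored} = {dominant_pct:.0f}%")
print(f"targeting efficiency: {eff_all:.0f}% to {eff_dna:.0f}%")
print(f"fold increase over regular AFLP: {fold_low:.1f} to {fold_high:.1f}")
```

The column totals reproduce the 141 scored markers and 119 two-fits of Table 3, and the two efficiency figures (24% and about 32%) divided by 7.5% give roughly 3.2 and 4.2, consistent with the stated 3 to 4 fold increase.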



SUBSTITUTE SHEET (RULE 26) 



WO 2005/003393 PCT/NL2004/000471 


Table 4 (continued) 

SSP1 fragment   E10      Exon -> Intron   ATGTCGTCAATGCAGGTAAG   02Y056   SSP16
SSP1 fragment   F04/05   Exon -> Intron   TGTCTCTGAGTGCAGGTAAG   02Y057   SSP17
SSP1 fragment   F08      Exon -> Intron   CTCAGAATTTTCCAGGTAAG   02Y058   SSP18
SSP1 fragment   F09      Exon -> Intron   AAGAAAACACAGCAGGTAAG   02Y059   SSP19
SSP9 fragment   G01      Exon -> Intron   TCGAATTGTCACCTAACCTG   02Y060   SSP20
SSP9 fragment   G02      Exon -> Intron   TCGCTGCACTCCTTAACCTG   02Y061   SSP21
SSP1-3'CTG               Intron -> Exon   CGTCATGCATGACACTTAC    02Y062   SSP1B
SSP5-3'CT                Intron -> Exon   CTGACTGATAAGCGACTTAC   02Y063   SSP5B
SSP7-3'CTGA              Intron -> Exon   CTGACTCGATTCAGACTTAC   02Y064   SSP7B
                                          GATGAGTCCTGAGTAA                M00k
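
As an illustration of how the primers listed above relate to one another, the sketch below groups them by their last six nucleotides and computes the reverse complement of each shared tail. The pairing of primer names with sequences follows the row order of the listing and is an assumption, as are the function names; the M00k AFLP primer is left out because it is not a splice-site primer.

```python
from collections import defaultdict

# Name-to-sequence pairing assumed from the row order of the listing.
PRIMERS = {
    "SSP16": "ATGTCGTCAATGCAGGTAAG",
    "SSP17": "TGTCTCTGAGTGCAGGTAAG",
    "SSP18": "CTCAGAATTTTCCAGGTAAG",
    "SSP19": "AAGAAAACACAGCAGGTAAG",
    "SSP20": "TCGAATTGTCACCTAACCTG",
    "SSP21": "TCGCTGCACTCCTTAACCTG",
    "SSP1B": "CGTCATGCATGACACTTAC",
    "SSP5B": "CTGACTGATAAGCGACTTAC",
    "SSP7B": "CTGACTCGATTCAGACTTAC",
}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def group_by_tail(primers, n=6):
    """Group primer names by the last n nucleotides of their sequence."""
    groups = defaultdict(list)
    for name, seq in primers.items():
        groups[seq[-n:]].append(name)
    return dict(groups)


groups = group_by_tail(PRIMERS)
for tail, names in groups.items():
    print(tail, reverse_complement(tail), ", ".join(names))
```

With this pairing the primers fall into three groups with the shared tails GGTAAG, AACCTG and ACTTAC. The tail ACTTAC of the three intron -> exon primers is the reverse complement of GTAAGT, so these primers read the same kind of site from the opposite strand.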

